[ML] Merge the pytorch-inference feature branch #73660
Initial start/stop trained model deployment actions.
Adds the model_type field to TrainedModelConfig for distinguishing between models that can be loaded via the model loading service and those that require a native process.
This adds a temporary API for doing inference against a trained model deployment.
Introduces code for reassembling the individual chunks a model is stored in and streaming those chunks to the inference process. Reuses the TrainedModelDefinitionDoc format already defined for boosted tree models.
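The chunk reassembly described above can be illustrated with a minimal sketch. This is not the PR's Java implementation: the field names `doc_num`, `definition` and `eos`, and the helper names `split_into_docs` and `stream_chunks`, are assumptions chosen for illustration only.

```python
import base64
from typing import Iterable, Iterator


def split_into_docs(model_bytes: bytes, chunk_size: int) -> list[dict]:
    """Split a binary model into ordered chunk documents (hypothetical layout)."""
    chunks = [
        model_bytes[i:i + chunk_size]
        for i in range(0, len(model_bytes), chunk_size)
    ]
    return [
        {
            "doc_num": n,  # position of the chunk in the model
            "definition": base64.b64encode(chunk).decode("ascii"),
            "eos": n == len(chunks) - 1,  # marks the final chunk
        }
        for n, chunk in enumerate(chunks)
    ]


def stream_chunks(docs: Iterable[dict]) -> Iterator[bytes]:
    """Yield decoded chunks in order, failing if any chunk is missing."""
    ordered = sorted(docs, key=lambda d: d["doc_num"])
    for expected, doc in enumerate(ordered):
        if doc["doc_num"] != expected:
            raise ValueError(f"missing chunk {expected}")
        yield base64.b64decode(doc["definition"])
    if not ordered or not ordered[-1]["eos"]:
        raise ValueError("final chunk not found")
```

A consumer would write each yielded chunk to the inference process as it arrives, rather than joining them into one large buffer first.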
Binary data is stored in Lucene base64 encoded. The same data held in a Java string uses 2 bytes (UTF-16) for each base64 character, consuming twice the memory required. The compressed binary representation of the models can be stored more efficiently in ByteReferences. For BWC, a new field mapping binary_definition is added to .ml-inference-* and the index version is incremented.
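The memory argument above can be checked with some quick arithmetic. The sketch below is in Python rather than Java and simply measures the three representations, taking the 2-bytes-per-character figure from the description; `storage_sizes` is a hypothetical helper name.

```python
import base64


def storage_sizes(raw: bytes) -> dict:
    """Compare the size of raw bytes with base64 text held as UTF-16."""
    b64_text = base64.b64encode(raw).decode("ascii")
    return {
        "raw_bytes": len(raw),
        # base64 emits 4 characters for every 3 input bytes (padded)
        "base64_chars": len(b64_text),
        # 2 bytes per character when the text is held as UTF-16
        "utf16_bytes": len(b64_text.encode("utf-16-le")),
    }


sizes = storage_sizes(bytes(3000))
```

For a 3000-byte input this gives 4000 base64 characters and 8000 bytes as UTF-16: double the base64 text itself, and roughly 2.7 times the raw binary.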
This adds a location field to TrainedModelConfig for large models that cannot be PUT inline with the config. Large models are reassembled from their location.
Adds tokenisation for BERT models via the WordPiece algorithm, using the vocabulary defined with the model, and introduces the concept of NLP tasks. Each task is configured with a BERT model supporting that task; pre-processing and post-processing are defined by the task. Named Entity Recognition and Fill Mask are the 2 task types supported by this PR.
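For readers unfamiliar with WordPiece, here is a minimal sketch of the usual greedy longest-match-first scheme. It is not the code in this PR. It assumes whitespace pre-splitting, lowercasing (as for an uncased model), the `##` continuation prefix and an `[UNK]` token; punctuation splitting and special tokens are left out.

```python
def wordpiece(word: str, vocab: set, unk: str = "[UNK]", max_chars: int = 100) -> list:
    """Split one word into the longest vocabulary pieces, left to right."""
    if len(word) > max_chars:
        return [unk]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation of a word
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: set) -> list:
    """Lowercase, split on whitespace, then apply WordPiece to each word."""
    return [piece for word in text.lower().split() for piece in wordpiece(word, vocab)]
```

With the toy vocabulary `{"the", "un", "##aff", "##able"}`, `tokenize("The unaffable", vocab)` yields `["the", "un", "##aff", "##able"]`. A task such as Named Entity Recognition would then map these pieces to vocabulary ids before inference and map predictions back to words afterwards.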
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/clients-team (Team:Clients)
Early review comments, very excited for this functionality.
rest-api-spec/src/main/resources/rest-api-spec/api/ml.start_deployment.json
rest-api-spec/src/main/resources/rest-api-spec/api/ml.start_deployment.json
rest-api-spec/src/main/resources/rest-api-spec/api/ml.stop_deployment.json
Thanks for jumping in with an early review @sethmlarson
👍 This makes sense to me, I'll raise it with the team. I've missed out the spec of the These APIs may be in flux for a short while as we work through all the use cases. Is that a problem for the clients team? Would you prefer us to tell you when we've settled on something we like?
@davidkyle It's no problem for us that these APIs may change, especially if they're experimental/on
Looks good from an API spec perspective 🎉 One comment I was unsure about.
rest-api-spec/src/main/resources/rest-api-spec/api/ml.infer_trained_model_deployment.json
LGTM
The feature branch contains changes to configure PyTorch models with a TrainedModelConfig and defines a format to store the binary models. The _start and _stop deployment actions control the model lifecycle, and the model can be directly evaluated with the _infer endpoint. Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask.
The feature branch consists of these PRs: